The effect of mobility reductions on infection growth is quadratic in many cases

Stay-at-home orders were introduced in many countries during the COVID-19 pandemic, limiting the time people spent outside their home and the attendance of gatherings. In this study, we argue from a theoretical model that in many cases the effect of such stay-at-home orders on incidence growth should be quadratic, and that this statement should also hold beyond COVID-19. That is, a reduction of the out-of-home duration to, say, 70% of its original value should reduce incidence growth and thus the effective R-value to 70% · 70% = 49% of its original value. We then show that this hypothesis can be substantiated from data acquired during the COVID-19 pandemic by using a multiple regression model to fit a combination of the quadratic out-of-home duration and temperature to the COVID-19 growth multiplier. We finally demonstrate that many other models, when brought to the same scale, give similar reductions of the effective R-value, but that none of these models extend plausibly to an out-of-home duration of zero.

In the simulation runs, turnout α (∈ [0.1, 1]) and infection probability π (∈ {0.01, 0.02, 0.03, 0.04, 0.05}) are varied. For each combination of turnout and infection probability, we run 100 simulations; in each simulation, patient zero, who is infected at the beginning of the simulation, is newly chosen. Furthermore, as the infectious agent also follows the reduced office attendance, α also denotes the probability with which patient zero will be at the office. The expected number of infected individuals for different values of α is depicted in Figure 1.
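The quadratic dependence on turnout can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the simulation behind Figure 1: it assumes a single time step, a hypothetical office of `n_people` agents, and that patient zero and every colleague attend independently with probability α. Under these assumptions, the expected number of new infections is α² · π · (n_people − 1). All function names and the office size are illustrative choices.

```python
import random


def simulate_one_step(alpha, pi, n_people=50, rng=random):
    """Return the number of new infections after a single time step.

    Assumptions (illustrative, not taken from the study): one office,
    one time step, everyone attends independently with probability alpha.
    """
    # Patient zero follows the reduced attendance as well.
    if rng.random() >= alpha:
        return 0
    infected = 0
    for _ in range(n_people - 1):
        present = rng.random() < alpha
        # A colleague can only be infected when present at the office.
        if present and rng.random() < pi:
            infected += 1
    return infected


def mean_infections(alpha, pi, runs=20000, n_people=50, seed=0):
    """Average number of new infections over many independent runs."""
    rng = random.Random(seed)
    total = sum(simulate_one_step(alpha, pi, n_people, rng) for _ in range(runs))
    return total / runs


def expected_infections(alpha, pi, n_people=50):
    """Closed-form expectation under the assumptions above."""
    return alpha ** 2 * pi * (n_people - 1)
```

Comparing `mean_infections(0.7, 0.05)` with `expected_infections(0.7, 0.05)` shows the factor α² directly: both patient zero and each potential contact have to be present for a transmission to occur.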

Relationship of COVID-19 and Mobility in Germany
As noted in Section "Relationship between COVID-19 and mobility in Germany", multiple studies investigate the reduction of mobility in Germany related to the COVID-19 pandemic: Conducting an analysis of movement records from mobile phones, [3] found that lockdown measures caused substantial and long-lasting structural changes in mobility, noting a heterogeneous spatial distribution of reductions (increased reduction in large cities compared to less densely populated areas) and the largest reduction for mid-March 2020. By analyzing the Google mobility data [4] for the federal state of Saxony until July 2020, [5] found an overall decline in mobility for all commercial activities and an increase for parks and residential locations. Additionally, using the three mobility categories (transit, walking, driving) of the Apple mobility data [6], [7] found a negative correlation between mobility and confirmed case numbers in Germany.
Using survey data from the first wave of the COVID-19 pandemic in Germany, [8] found the most significant mobility reduction for recreational trips, followed by work trips. Here, public transit displays the largest decrease in share in terms of modal split.
[9] carried out a self-assessment survey, noting that the share of the adult population which had public transport in their mode choice set had dropped by almost half during the lockdown of spring 2020. Finally, concentrating on the rural population and using data from interviews and a survey, [10] found that a considerable share of respondents stated that they had not changed their mobility behavior, and that in consequence insights from urban settings cannot be directly transferred to rural areas.
Returning to the question of mode choice, [9] investigated car usage in Germany before (spring 2019) and during (spring 2020) the pandemic, finding that overall mileage decreased (the magnitude of the decrease depended on household characteristics, age, and cubic capacity of the car), but noting that at the time of writing it was unclear whether or not these changes would turn out to be structural.
Furthermore, [11] computed the change in general mobility behavior at the German county level in January 2021 compared to the same period of the previous year, finding that mobility changes show significant socioeconomic heterogeneity. In contrast, analyzing continuous location data (provided via a smartphone app, from the beginning of March until mid-May 2020) in combination with demographic characteristics, [12] found that the German population reduced its mobility in a rather consistent and uniform way. They found no evidence for statistically significant period-by-subgroup interactions. In a follow-up research letter, [13] additionally analyzed the reduction in daily distance travelled by an individual person up until June 2021, again finding that the pattern of relative reduction was similar in all age groups and in both sexes, with a slightly higher relative mobility in the younger age group in summer 2020 and the last month of observation.

Comparison of Mobility Data Sources
In this work, we have used the out-of-home duration by Senozon [14]. An alternative data source is the Google COVID-19 Community Mobility Reports. The Google mobility reports provide mobility data for six different categories, including "residential". Mobility data is presented relative to a baseline value. For each day of the week, the baseline value is the median value from the 5-week period from January 3rd until February 6th, 2020. Assuming that an average person in Germany spends around 15.7 hours per day at home [15], the relative numbers for the category "residential" can be converted to absolute numbers, i.e. the amount of time people spent at home. From the absolute amount of time spent at home, one can then infer the amount of time people did not spend at home. This is then comparable to the number we have used above. Please note that when trying to apply our method to other nations, another assumption for the baseline out-of-home duration might be necessary.
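The conversion described above can be sketched in a few lines. The snippet assumes that the Google "residential" value is a percentage change of the time spent at home relative to the baseline, and it uses the 15.7 hours from [15] as that baseline; the function and constant names are illustrative.

```python
BASELINE_HOME_HOURS = 15.7  # average at-home duration in Germany, taken from [15]
HOURS_PER_DAY = 24.0


def out_of_home_hours(residential_change_pct, baseline_home_hours=BASELINE_HOME_HOURS):
    """Convert a relative 'residential' change (in percent) to out-of-home hours.

    Assumption: the percentage change applies to the at-home duration itself.
    """
    at_home = baseline_home_hours * (1.0 + residential_change_pct / 100.0)
    return HOURS_PER_DAY - at_home


baseline_out = out_of_home_hours(0.0)    # no change relative to the baseline
lockdown_out = out_of_home_hours(10.0)   # 10% more time spent at home
```

For another country, only `baseline_home_hours` would need to be replaced by a suitable national value.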

Model Comparison (Schematic Description)
To settle on the final model (12), we employ a multi-step model comparison (see also Figure 3):
• In the first step, we consider mobility-only models (see Section "Mobility only models" for details) to determine whether the influence of mobility on the growth multiplier was linear, quadratic, or cubic.
• Then, we compare the intercept and no-intercept models, arguing for the exclusion of the intercept (again, see Section "Mobility only models").
• In the third step, we compare multiple weather variables: the weekly average of the daily maximum temperature Tmax, the weekly average of the daily average temperature Tavg, the weekly average of the daily total precipitation Precip, and the outdoor fraction outFrac. Model outputs and a detailed discussion of this step can be found in Supplementary Section 1.5.
• In the final step, we exclude the linear term for outFrac and only include the interaction term of out-of-home duration and outFrac (see Section "Models including outdoor fraction").

Model Comparison (Model Outputs)
Step 1 - Mobility only models
In a first step, we compare multiple mobility-only models in which the out-of-home duration enters with power m ∈ {1, 2, 3}. Here, the growth multiplier is denoted by $G_t = I_{t+1}/I_t$, where $I_t$ is the weekly average incidence and t indexes the weeks, and the out-of-home duration is denoted by $D_t$. Based on the arguments detailed in Section "Mobility only models", we decide on the quadratic model (m = 2). For details and model outputs, see Section "Mobility only models".
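A minimal sketch of how single-term mobility-only models of different degree can be compared is given below. It assumes the no-intercept form G = β · D^m with a single coefficient β, ignores the lag between mobility and growth, and uses synthetic, noise-free data generated from a quadratic law purely for illustration; none of the numbers stem from the study.

```python
def fit_power_model(durations, growth, m):
    """Least-squares fit of G = beta * D**m without intercept.

    Returns the tuple (beta, residual sum of squares).
    """
    x = [d ** m for d in durations]
    beta = sum(xi * gi for xi, gi in zip(x, growth)) / sum(xi * xi for xi in x)
    rss = sum((gi - beta * xi) ** 2 for xi, gi in zip(x, growth))
    return beta, rss


# Synthetic data (illustration only): growth follows 0.02 * D**2 exactly.
durations = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]   # hours out of home
growth = [0.02 * d ** 2 for d in durations]       # hypothetical growth multipliers

fits = {m: fit_power_model(durations, growth, m) for m in (1, 2, 3)}
best_m = min(fits, key=lambda m: fits[m][1])      # degree with the smallest RSS
```

On real data, the residual sum of squares alone would not settle the choice; the comparison in the paper additionally rests on the theoretical arguments given in Section "Mobility only models".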
Step 2 - Intercept vs. no intercept
In the second step, we compare the quadratic intercept and no-intercept models. Again, the model outputs and our reasoning for continuing with the no-intercept model (1) are provided in Section "Mobility only models".

Step 3 - Comparison of weather variables
As noted in Section "Models including outdoor fraction", the inclusion of the share of activities performed outside (outFrac), instead of temperature itself, is based on our previous modelling work. To additionally support this variable transformation, we set up models analogous to equation (11), but with the weekly average of the daily maximum temperature Tmax, or the weekly average of the daily average temperature Tavg, instead of outFrac. The output of the Tmax model is given in Table 1, and the output of the corresponding Tavg model in Table 2. For comparison, the result of the model for the outdoor fraction outFrac, which we presented in Section "Models including outdoor fraction", can be found in Table 3.
Besides temperature (and its transformation outFrac), it is possible that precipitation influences behavior and therefore, indirectly, the growth multiplier G. We therefore consider models that include the variable precipitation (Precip) and its interaction with the squared out-of-home duration $D^2$. The output of the Tmax model with precipitation is found in Table 4, that of the corresponding Tavg model in Table 5, and that of the outFrac model in Table 6.
Step 4 - Only considering weather in combination with mobility
In Section "Models including outdoor fraction", we highlighted statistically and theoretically motivated reasons for excluding the linear term outFrac (see Section "Models including outdoor fraction" for details). The output of the model from the aforementioned section is shown in Table 7.
We note that considering the out-of-home duration up to degree three does not improve the model fit (see Table 2 for regression output for the linear, quadratic, and cubic mobility-only models). Consequently, we endorse the models including only a single mobility term. Furthermore, we want to point out that the model considering the linear, quadratic, and cubic terms simultaneously is also considered and dismissed in the best subset selection in Section 1.8.1.

Model Comparison (Determination of Lag)
To determine that the lag between growth multiplier and out-of-home duration is 2 weeks, we considered every model of every step of Figure 3 with a lag of 0 to 4 weeks. When considering the mobility-only models (with or without intercept), a lag of 3 weeks leads to the best model fit. However, after the inclusion of a weather variable, a 2-week lag is favored. Summary statistics can be found in Table 9. Note on step 3 (marked 3⋆ in Table 9): All models of step 3 of Figure 3 were compared for a lag of 0-4 weeks. However, for all lags the model including the outdoor fraction outFrac was favored, and for reasons of clarity only the summary statistics of the best-fitting model are given in the table.
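The lag comparison can be sketched as follows. The snippet pairs each growth multiplier with the out-of-home duration from `lag` weeks earlier and refits a quadratic no-intercept model for every candidate lag. It uses the mean squared error as a stand-in for the summary statistics of Table 9, and the weekly series is synthetic, constructed with a built-in 2-week delay; both choices are assumptions made for illustration.

```python
def lagged_pairs(growth, durations, lag):
    """Pair G_t with D_{t-lag}; the first `lag` weeks have no partner and are dropped."""
    return [(growth[t], durations[t - lag]) for t in range(lag, len(growth))]


def quadratic_mse(pairs):
    """Mean squared error of the no-intercept fit G = beta * D**2."""
    x = [d ** 2 for _, d in pairs]
    g = [gi for gi, _ in pairs]
    beta = sum(xi * gi for xi, gi in zip(x, g)) / sum(xi * xi for xi in x)
    return sum((gi - beta * xi) ** 2 for xi, gi in zip(x, g)) / len(pairs)


# Synthetic weekly series (illustration only): growth follows D squared with a 2-week delay.
durations = [8.0, 7.5, 6.0, 5.0, 5.5, 6.5, 7.0, 7.8, 6.2, 5.1, 5.9, 7.2]
growth = [0.02 * durations[max(t - 2, 0)] ** 2 for t in range(len(durations))]

mse_by_lag = {lag: quadratic_mse(lagged_pairs(growth, durations, lag)) for lag in range(5)}
best_lag = min(mse_by_lag, key=mse_by_lag.get)
```

Because longer lags discard more of the early weeks, the mean rather than the sum of squared residuals is used here so that the lags remain comparable.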
1.8 Model selection via best subset selection, Lasso regularization, elastic net regularization
In Sections "Mobility only models" and "Models including outdoor fraction" (and in Supplementary material 1.5), theoretical assumptions as well as visual results (see Figure 5) guided the model selection. Complementarily, in this subsection we show that using three different variable selection methods, namely best subset selection, Lasso regularization, and elastic net regularization, supports our conclusion.
Due to the arguments presented in Sections "Mobility only models" and "Models including outdoor fraction", we consider the mobility variable D up to degree three, the outdoor fraction outFrac up to degree three, and every combination of them up to degree three, leaving us with 9 predictors ($D$, $D^2$, $D^3$, $outFrac$, $outFrac^2$, $outFrac^3$, $D \times outFrac$, $D^2 \times outFrac$, $D \times outFrac^2$) and $2^9$ models to choose from. Furthermore, due to the reasoning presented in Section "Mobility only models", we enforce that the intercept is equal to 0.

Best subset selection
Best subset selection aims to find a small subset of predictors for a linear model. When performing best subset selection, a separate least squares regression model must be fit for each possible combination of the predictor variables. For every possible subset size, best subset selection chooses the model with the minimal residual sum of squares (RSS). A single best model can then be chosen by using one of the following three statistics: the Bayesian Information Criterion (BIC), Mallows' $C_p$, or the adjusted $R^2$. Here, we consider all three of them to choose the model which best satisfies our needs. In addition, we enforce that the maximum number of independent variables is 3.
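The size of the search space can be made concrete with a short enumeration sketch. It only lists the candidate predictor sets and provides a BIC helper; the regression fits themselves are omitted. The BIC expression is the common Gaussian form up to an additive constant, which is an assumption, since the text does not state which variant was used. The predictor labels are plain strings chosen for illustration.

```python
from itertools import combinations
from math import log

PREDICTORS = [
    "D", "D^2", "D^3",
    "outFrac", "outFrac^2", "outFrac^3",
    "D*outFrac", "D^2*outFrac", "D*outFrac^2",
]


def candidate_subsets(predictors, max_size):
    """Yield every non-empty predictor subset with at most max_size members."""
    for size in range(1, max_size + 1):
        yield from combinations(predictors, size)


def bic_from_rss(rss, n_obs, n_params):
    """BIC of a Gaussian linear model, up to an additive constant."""
    return n_obs * log(rss / n_obs) + n_params * log(n_obs)


all_models = 2 ** len(PREDICTORS)                        # counts the empty model as well
restricted = list(candidate_subsets(PREDICTORS, 3))      # at most 3 independent variables
```

For equal RSS, `bic_from_rss` penalizes every additional parameter by log(n), which is why BIC tends to prefer smaller subsets than the adjusted $R^2$.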
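BIC: Comparing the $2^9$ possible models, the model with the smallest BIC (see Figure 4) is the model we already favored in Section "Models including outdoor fraction", which includes $D^2$ and $D^2 \times outFrac$ (see the Conclusion of this section).
Mallows' $C_p$: Among all models, the model with the smallest $C_p$ (see Figure 5) includes $D^3$ as the mobility-only variable together with $D^2 \times outFrac$.
Adjusted $R^2$: Finally, among all $2^9$ models, the model with the largest adjusted $R^2$ (see Figure 6) likewise includes $D^3$ and $D^2 \times outFrac$.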

Figure 6: Adjusted $R^2$ for the different subset sizes. The maximal adjusted $R^2$ is equal to 0.6562 and is reached when the subset size is equal to 3. However, already when the subset size is equal to 2, we obtain an adjusted $R^2$ of 0.6345.

Lasso regularization
Lasso is a regression method which performs both variable selection and regularization, with the aim of improving interpretability and model accuracy. In our case, Lasso regularization favors a model that includes $D^2$ and $D^2 \times outFrac$ (see the Conclusion of this section).

Elastic net regularization
Elastic net regularization is a penalized linear regression model combining both $L_1$ and $L_2$ penalties. In consequence, elastic net regularization can be viewed as a combination of Ridge and Lasso regression. As the degree of mixing we chose α = 0.5, and we chose the shrinkage parameter λ using 10-fold cross validation. The cross validation was performed 100 times, giving us a distribution of λ values. Both mean(λ) and median(λ) favor a model that includes $D^2$ and $D^2 \times outFrac$ (see the Conclusion of this section).

Conclusion
Summarizing the results of the best subset selection, the Lasso regularization, and the elastic net regularization above, we observe that the different variable selection methods favor different models. However, we note that all three methods (as well as the 3 different criteria when applying best subset selection) select $D^2 \times outFrac$ as an independent variable, and that Lasso regularization, elastic net regularization, and BIC (when applying best subset selection) also include $D^2$. In contrast, Mallows' $C_p$ and the adjusted $R^2$ favor $D^3$ as the mobility-only variable. Furthermore, when deciding based on BIC in best subset selection, the chosen model is the one we have already favored in Section "Models including outdoor fraction". By Occam's razor, and as best subset selection is the preferable method in this case, we decide on this model, i.e. the final model (12).

Model diagnostics
In order to assess the adequacy of the final model (12), several model diagnostics were conducted. Figure 7 presents a scatter plot of the residuals against the predicted values from the model (top) as well as a scatter plot of the square root of the absolute value of the standardized residuals against the predicted values from the model (bottom). Under the assumption of homoscedasticity, we expect both panels to exhibit a random scatter of points without a systematic pattern (around zero in the top panel). However, the rightmost portion of the top panel suggests the presence of heteroscedasticity and a potential non-linear relationship. To investigate this further, a Breusch-Pagan test was performed, which tests for heteroscedasticity in the residuals. The test yields a p-value of 0.1479 (> 0.05), indicating a lack of evidence for heteroscedasticity. Altogether, the model diagnostics as well as Figure 3a and Figure 3b indicate that the model generally seems appropriate for the relationship that we aim to model, but three outliers at the beginning of the second wave, which were discussed in detail in the Results section, are not fully explained by the proposed model.

Comparison of Our Model to Models in the Literature
As addressed in Section "Comparison of our model to models in the literature", we compare our model values to values in the literature whenever possible. This is not possible for the publications that report the quality of the fit but omit information necessary to reconstruct the full model (see Section "Relationship between COVID-19 and mobility world-wide"). However, the following four papers provide the necessary coefficients and can thus be compared to our model.
To compare the influence of mobility reduction across models, we choose the following scenario: For all comparisons, we assume that the out-of-home duration is decreased by 40%. The average German out-of-home duration is around 8 hours per day (see Supplementary Section 1.3), leading to an average at-home duration of about 16 hours per day. Reducing the out-of-home duration by 40%, or 3.2 hrs, to 4.8 hrs thus corresponds to increasing the at-home duration by 20%.
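The arithmetic of this scenario can be checked with a few lines. The last step applies the purely quadratic hypothesis from the abstract (the R-value scales with the squared remaining out-of-home fraction) and ignores the weather interaction of the final model; the variable names are illustrative.

```python
out_of_home = 8.0                      # hours per day, approximate German average
at_home = 24.0 - out_of_home           # corresponding at-home duration
reduction = 0.40                       # scenario: out-of-home duration reduced by 40%

new_out_of_home = out_of_home * (1.0 - reduction)    # remaining out-of-home hours
hours_saved = out_of_home - new_out_of_home          # absolute reduction in hours
new_at_home = 24.0 - new_out_of_home
at_home_increase = new_at_home / at_home - 1.0       # relative increase of at-home time

# Purely quadratic hypothesis: R scales with the squared remaining fraction.
remaining_r_fraction = (1.0 - reduction) ** 2
```

The relative change of the at-home duration is only half as large as that of the out-of-home duration because the at-home baseline (16 hours) is twice the out-of-home baseline (8 hours).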
Some of the models we compare with are linear, and for these models ∆R can be obtained from the slope coefficient without having to determine the intercept. In the following, we thus compare with our ∆R, which is between −0.6 (warm summer) and −0.75 (cold winter).
Comparison with [19]
[19] use a model in which $m_{t,i}$ denotes the remaining fraction of mobility compared to a baseline, expressed as a value between 0 and 1. The remaining mobility fraction $m_{t,i}$ is based on various categories from Google [4] or Apple [6] mobility data. The percentage changes must then be divided by 100 to obtain $m_{t,i}$. When using a combination of Google and Apple mobility data, the authors obtain $\beta_i = 1.51$ and $R_{0,i} = 0.97$ for Germany during the period from late April to late October 2020. In our introduced example, a reduction of the out-of-home duration of 40% (= 0.4) corresponds to a remaining mobility fraction of 0.6.

Comparison with [20]
[20] analyze the correlation between the effective reproduction number and mobility. For the "residential" category, they fit a model in which $R_t$ denotes the effective reproduction number and $h_t$ denotes the percentage change of time spent at home as introduced above. For a 14-day lag (τ = 14), they compute a coefficient of $\beta_r = -0.021$ for Germany. Hence, for a 20% increase of the at-home duration, their model computes a change of the effective R-value of −0.021 · 20 = −0.42. Again, this is somewhat smaller, but still consistent with our own range of [−0.75, −0.6].

Comparison with [21]
[21] uses a fixed-effects model that controls for state-level effects to explore the relationship between Google mobility data and $R_t$ at the US state level. For the "residential" category, the model contains a state-specific intercept $\beta_{0,state}$, the percentage change $h_t$ of the "residential" category, and a trend variable $days_t$ denoting the days since the start of the pandemic in the state. The author obtains $\beta_r = -0.0236$ and $\beta_d = -0.0020$ when considering a 14-day lag (τ = 14) between mobility and the effective R-value. The values of $\beta_{0,state}$ are not given in the paper; to compare with our example, one can set $h_{t-\tau}$ and $days_t$ to zero and then use $\beta_{0,state} = \log R_0 = \log(1.4)$. For our example of increasing the at-home duration, we have $h_{t-\tau} = 20$. The trend variable days leads to even smaller reductions over time, but after a year, days = 365 leads to $R_t < 1$ even with $h_{t-\tau} = 0$, which is clearly not plausible. Indeed, the model was estimated against data until June 23rd, 2020, and it is plausible that the trend term, at least in part, rather captures the seasonal effect.
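The comparison with [21] can be retraced numerically under one explicit assumption: that the model is log-linear, i.e. log R_t = β_0,state + β_r · h_{t−τ} + β_d · days_t. This form is inferred from the choice β_0,state = log R_0 in the text and is not quoted from [21]. The coefficients are those reported above; the function name is illustrative.

```python
from math import exp, log

BETA_R = -0.0236   # coefficient of the "residential" change, as reported for [21]
BETA_D = -0.0020   # coefficient of the trend variable days, as reported for [21]
R0 = 1.4           # baseline chosen in the text: beta_0 = log(1.4)


def r_value(h_pct, days, r0=R0):
    """R_t under the ASSUMED log-linear form log R = log r0 + BETA_R*h + BETA_D*days."""
    return exp(log(r0) + BETA_R * h_pct + BETA_D * days)


r_baseline = r_value(0, 0)        # equals r0 by construction
r_reduced = r_value(20, 0)        # at-home duration increased by 20%
delta_r = r_reduced - r_baseline  # change of the R-value in the example scenario
r_after_year = r_value(0, 365)    # effect of the trend term alone after one year
```

Under this assumption, the sketch gives a ∆R of roughly −0.5 for the example scenario, and the trend term alone pushes the R-value below 1 after one year, in line with the plausibility concern raised above.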
Comparison with [22]
For the "residential" category, [22] obtain a model in which h denotes the percentage change of the "residential" category. For the 14-day lag (τ = 14), the coefficient differs between the five considered public health units in the Greater Toronto Area and takes on values between −0.035 and −0.018.

Figure 1: Expected no. of infected individuals vs. share of attendees α after 2000 time steps.

Figure 2: Daily out of home duration per person (in hours) for 2020.Comparison of Senozon and Google COVID-19 Community Mobility Reports data.

Figure 3: Multi-step model comparison to determine the final model (12).

Figure 4: BIC for the different subset sizes.

Figure 5: Mallow's Cp for the different subset sizes.

Figure 8 (top) shows the model residual quantiles vs. the theoretical quantiles if the residuals were to follow a normal distribution. A slight deviation can be noticed at the extreme ends, while the vast majority of the model quantiles align well with the theoretical quantiles. Finally, we use Cook's distance to detect observations that strongly influence the fitted values of the model, which can be seen in Figure 8. Observations with a Cook's distance larger than the threshold of 4/n, where n is the number of observations, are traditionally deemed outliers. With n = 40, three observations pass this threshold: Observation number 33 (corresponding to 18-October-2020) displays the largest Cook's distance (1.436696e-01), observation 34 (corresponding to 25-October-2020) displays the second-largest Cook's distance (1.381654e-01), and observation 32 (corresponding to 11-October-2020) displays the third-largest Cook's distance (1.092919e-01).
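The outlier rule can be verified directly from the reported numbers. The snippet below only applies the 4/n threshold to the three Cook's distances quoted in the text; it does not recompute the distances from the regression, and the dictionary layout is an illustrative choice.

```python
n_obs = 40
threshold = 4 / n_obs   # rule-of-thumb cutoff for Cook's distance

# Cook's distances as reported in the text, keyed by observation number.
cooks_distance = {
    32: 0.1092919,   # 11-October-2020
    33: 0.1436696,   # 18-October-2020
    34: 0.1381654,   # 25-October-2020
}

# Observations above the threshold, ordered from most to least influential.
flagged = sorted(
    (obs for obs, dist in cooks_distance.items() if dist > threshold),
    key=lambda obs: cooks_distance[obs],
    reverse=True,
)
```

All three observations exceed the cutoff of 0.1, with observation 32 only slightly above it.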

Table 1: Model output, out-of-home duration D and temperature Tmax lead by 2 weeks.

Table 2: Model output, out-of-home duration D and temperature Tavg lead by 2 weeks.

Table 3: Model output of (2), out-of-home duration D and outdoor fraction outFrac lead by 2 weeks.

Table 4: Model output, out-of-home duration D, temperature Tmax, and precipitation Precip lead by 2 weeks.

Table 5: Model output, out-of-home duration D, temperature Tavg, and precipitation Precip lead by 2 weeks.

Table 6: Model output, out-of-home duration D, outdoor fraction outFrac, and precipitation Precip lead by 2 weeks.
We note that model (3) has the highest adjusted $R^2$ (0.6502), directly followed by model (2) (0.6345). However, for model (2), both AIC and BIC display lower values. Consequently, we decide to continue with model (2).

Table 7: Model output for final model, out-of-home duration D and outdoor fraction outFrac lead by 2 weeks.
In addition to considering the linear, quadratic, and cubic out-of-home duration separately, we also considered them simultaneously using orthogonal polynomial regression. The model output can be found in Table 8.

Table 8: Model output for model including linear, quadratic, and cubic out-of-home duration. The out-of-home duration leads by 2 weeks.

Table 9: Regression output for different lags for the different steps of the model comparison illustrated in Figure 3.